Matching neural paths: transfer from recognition to correspondence search
Many machine learning tasks require finding per-part correspondences between
objects. In this work we focus on low-level correspondences - a highly
ambiguous matching problem. We propose to use a hierarchical semantic
representation of the objects, coming from a convolutional neural network, to
solve this ambiguity. Training it for low-level correspondence prediction
directly might not be an option in some domains where the ground-truth
correspondences are hard to obtain. We show how transfer from recognition can
be used to avoid such training. Our idea is to mark parts as "matching" if
their features are close to each other at all levels of the convolutional
feature hierarchy (neural paths). Although the overall number of such paths is
exponential in the number of layers, we propose a polynomial algorithm for
aggregating all of them in a single backward pass. The empirical validation is
done on the task of stereo correspondence and demonstrates that we achieve
competitive results among the methods which do not use labeled target domain
data. Comment: Accepted at NIPS 201
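The claim that exponentially many paths can be aggregated in polynomial time can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a toy layered graph in which `weights[l][i][j]` (a hypothetical name) is the score of the edge from node `i` in layer `l` to node `j` in layer `l + 1`, and it sums the product of edge scores over every path, once by enumeration and once by a single backward sweep.

```python
from itertools import product


def path_sum_bruteforce(weights):
    """Sum, over every path, the product of its edge weights (exponential cost)."""
    # Number of nodes in each layer, recovered from the matrix shapes.
    sizes = [len(weights[0])] + [len(w[0]) for w in weights]
    total = 0.0
    for path in product(*(range(s) for s in sizes)):
        score = 1.0
        for layer, w in enumerate(weights):
            score *= w[path[layer]][path[layer + 1]]
        total += score
    return total


def path_sum_backward(weights):
    """Compute the same quantity with one backward sweep (polynomial cost)."""
    # acc[j] holds the summed score of all paths from node j to the last layer.
    acc = [1.0] * len(weights[-1][0])
    for w in reversed(weights):
        acc = [sum(row[j] * acc[j] for j in range(len(acc))) for row in w]
    return sum(acc)
```

Both functions return the same value, but the backward sweep visits each edge once, whereas the enumeration visits every path.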
Efficient Minimization of Higher Order Submodular Functions using Monotonic Boolean Functions
Submodular function minimization is a key problem in a wide variety of
applications in machine learning, economics, game theory, computer vision, and
many others. The complexity of the general solver depends on the time required to
evaluate the function and on the number of variables \cite{Lee2015}. On the other hand, many computer
vision and machine learning problems are defined over special subclasses of
submodular functions that can be written as the sum of many submodular cost
functions defined over cliques containing few variables. In such functions, the
pseudo-Boolean (or polynomial) representation \cite{BorosH02} of these
subclasses is of low degree (or order, or clique size) relative to the number of variables. In
this work, we develop efficient algorithms for the minimization of this useful
subclass of submodular functions. To do this, we define a novel mapping that
transforms higher-order submodular functions into quadratic ones. The underlying
idea is to use auxiliary variables to model the higher order terms and the
transformation is found using a carefully constructed linear program. In
particular, we model the auxiliary variables as monotonic Boolean functions,
allowing us to obtain a compact transformation using as few auxiliary variables
as possible.
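As a minimal illustration of modelling a higher-order term with an auxiliary variable, the sketch below checks a well-known textbook identity by brute force: the submodular cubic term -x1*x2*x3 equals the minimum over an auxiliary Boolean z of the quadratic expression z*(2 - x1 - x2 - x3). This is not the linear-programming-based transformation described in the abstract, and the function names are hypothetical.

```python
from itertools import product


def cubic_term(x1, x2, x3):
    """Third-order submodular term with a negative coefficient."""
    return -x1 * x2 * x3


def quadratic_with_aux(x1, x2, x3):
    """Quadratic form in (x, z), minimised over the auxiliary Boolean z."""
    return min(z * (2 - x1 - x2 - x3) for z in (0, 1))


def reduction_is_exact():
    """Check the identity on all eight Boolean assignments."""
    return all(
        cubic_term(*x) == quadratic_with_aux(*x)
        for x in product((0, 1), repeat=3)
    )
```

Because the identity holds for every assignment, minimising the quadratic form jointly over x and z gives the same minimum as minimising the cubic term over x alone.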
Global structured models towards scene understanding
EThOS - Electronic Theses Online Service, GB, United Kingdom
Combining Appearance and Structure from Motion Features for Road Scene Understanding
In this paper we present a framework for pixel-wise object segmentation of road scenes that combines motion and appearance features. It is designed to handle street-level imagery such as that on Google Street View and Microsoft Bing Maps. We formulate the problem in a CRF framework in order to probabilistically model the label likelihoods and the a priori knowledge. An extended set of appearance-based features is used, which consists of textons, colour, location and HOG descriptors. A novel boosting approach is then applied to combine the motion and appearance-based features. We also incorporate higher order potentials in our CRF model, which produce segmentations with precise object boundaries. We evaluate our method both quantitatively and qualitatively on the challenging Cambridge-driving Labeled Video dataset. Our approach shows an overall recognition accuracy of 84% compared to the state-of-the-art accuracy of 69%.
Image Based Geo-localization in the Alps
Given a picture taken somewhere in the world, automatic geo-localization of such an image is an extremely useful task, especially for historical and forensic sciences, documentation purposes, organization of the world’s photographs and intelligence applications. While tremendous progress has been made over the last years in visual location recognition within a single city, localization in natural environments is much more difficult, since vegetation, illumination, and seasonal changes make appearance-only approaches impractical. In this work, we target mountainous terrain and use digital elevation models to extract representations for fast visual database lookup. We propose an automated approach for very large scale visual localization that can efficiently exploit visual information (contours) and geometric constraints (consistent orientation) at the same time. We validate the system at the scale of Switzerland (40,000 km²) using over 1000 landscape query images with ground truth GPS position.
Graph Cut based Inference with Co-occurrence Statistics
Markov and conditional random fields (CRFs) used in computer vision typically model only local interactions between variables, as this is computationally tractable. In this paper we consider a class of global potentials defined over all variables in the CRF. We show how they can be readily optimised using standard graph cut algorithms at little extra expense compared to a standard pairwise field. This result can be directly used for the problem of class-based image segmentation, which has seen increasing recent interest within computer vision. Here the aim is to assign a label to each pixel of a given image from a set of possible object classes. Typically these methods use random fields to model local interactions between pixels or super-pixels. One of the cues that helps recognition is global object co-occurrence statistics, a measure of which classes (such as chair or motorbike) are likely to occur in the same image together. There have been several approaches proposed to exploit this property, but all of them suffer from different limitations and typically carry a high computational cost, preventing their application on large images. We find that the new model we propose produces an improvement in the labelling compared to just using a pairwise model.
What, Where & How Many? Combining Object Detectors and CRFs
Computer vision algorithms for individual tasks such as object recognition, detection and segmentation have shown impressive results in the recent past. The next challenge is to integrate all these algorithms and address the problem of scene understanding. This paper is a step towards this goal. We present a probabilistic framework for reasoning about regions, objects, and their attributes such as object class, location, and spatial extent. Our model is a Conditional Random Field defined on pixels, segments and objects. We define a global energy function for the model, which combines results from sliding window detectors, and low-level pixel-based unary and pairwise relations. One of our primary contributions is to show that this energy function can be solved efficiently. Experimental results show that our model achieves significant improvement over the baseline methods on CamVid and PASCAL VOC datasets.